V. A Context Editor and Data Retrieval

Experience with the Dartmouth Editor has provided an interesting
insight into the utility of such an editor as a general purpose
information retrieval system. The Dartmouth Editor, which was implemented
on the Dartmouth Time-Shared Computer System (GE 235), has two major
portions: a line editor and a context editor. The former operates
on line numbers, while the latter operates on the strings of characters
contained in the lines. The latter has a provision for ignoring the
line numbers when they are inconsequential.

An interesting feature of the Dartmouth Editor is that, when it is
in use, any error in the form of an editing command causes it to
print out a list of legitimate commands and to offer to explain any command
in more detail. The instruction is provided at two levels. The first
occurs when something is typed which is not a legitimate editing command.
This error results in the message in the upper portion of Figure Va.

If the command is correct but the parameters are not in the proper
form, the system reverts to a more detailed level, and prints out a
description of the format of the specific command.

The strings which the editor is instructed to find are delimited by
any character which is not itself a part of the string. The string is
enclosed between a pair of balanced characters - one in front and one
behind the desired string. In the figures which follow, the solidus
is used quite uniformly as a string delimiter. Many other characters
would have served as well.
In each of the string commands, numerals are optional in front and in
back of the string. A number in front designates a line number at which
the search should start, while a number following the string indicates that
it is the nth occurrence which is desired. Thus the instruction
$FIND 500 /SMITH/3 $LOCATE /JONES/2 results in the following actions:

a. The search starts in line 500. 

b. Pointers are placed around the third occurrence of the word SMITH.

c. The search proceeds from there until the second occurrence of JONES. 

d. The line containing the second occurrence of the word JONES is printed. 

e. The pointer is left on the third SMITH in the file even if the
   rest of the search fails.
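
The positional rules in the list above can be sketched in Python. This is only an illustration under stated assumptions, not the Editor's own implementation: the function name find_nth, the dictionary model of a line-numbered file, and the (line, column) pointer are all hypothetical, and the chaining of a second search from the pointer is not modelled.

```python
def find_nth(lines, target, start_line=0, nth=1):
    """Return (line_number, column) of the nth occurrence of target.

    lines is a dict mapping line number -> text (a hypothetical file model).
    The search begins at the first line numbered >= start_line and counts
    occurrences in line-number order. None means the search failed.
    """
    seen = 0
    for number in sorted(lines):
        if number < start_line:
            continue
        column = lines[number].find(target)
        while column != -1:
            seen += 1
            if seen == nth:
                return number, column
            column = lines[number].find(target, column + 1)
    return None


sample = {
    490: "SMITH IS NOT COUNTED",   # before the starting line, so ignored
    500: "SMITH AND SMITH",
    510: "JONES MET SMITH",
    520: "JONES AGAIN",
}
pointer = find_nth(sample, "SMITH", start_line=500, nth=3)
```

Here the leading number plays the part of the 500 in front of the string, and nth plays the part of the 3 behind it.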

This pointer can be restored to the beginning of the file with the instruction
$BEGIN. It should be evident from the figures that follow that the system
accepts abbreviated forms of the editing commands.

In the above discussion, the numbers following the strings indicate the
nth occurrence. They serve a different function in the $REPLACE instruction.
Here the instruction $FIND /WORK/2 $REPLACE /WORKS/3 leaves the first two
occurrences of the word "work" intact and changes the next three occurrences
to "works." If we wished to change all of the verbs "is" in a file to "was,"
we should use the instruction: $FI /IS/ $RE /WAS/ 100. If there were
more than 100 uses of "is," the number could be increased accordingly. It is not
necessary to know exactly how many there are, as the Editor will change as many
as it finds.
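
The counting behavior described for $REPLACE can be sketched as follows. The function replace_after and its parameters skip and count are hypothetical names chosen for the illustration; the sketch follows the description in the text (leave some occurrences intact, then change at most a given number) and makes no claim about how the Editor does it internally.

```python
def replace_after(text, old, new, skip=0, count=1):
    """Leave the first `skip` occurrences of old intact, then change
    at most `count` of the following occurrences to new.

    Returns the new text and the number of changes actually made.
    """
    if not old:
        raise ValueError("the search string must not be empty")
    pieces = []
    position = 0
    seen = 0
    changed = 0
    while changed < count:
        hit = text.find(old, position)
        if hit == -1:
            break                      # fewer occurrences than requested
        seen += 1
        if seen > skip:
            pieces.append(text[position:hit])
            pieces.append(new)
            changed += 1
        else:
            pieces.append(text[position:hit + len(old)])
        position = hit + len(old)
    pieces.append(text[position:])
    return "".join(pieces), changed


result, made = replace_after("WORK " * 6, "WORK", "WORKS", skip=2, count=3)
all_changed, total = replace_after("IS IS", "IS", "WAS", count=100)
```

A generous count such as 100 is harmless, since the loop stops when no further occurrence is found. Note that a bare string such as IS would also match inside longer words; surrounding blanks in the search string would be one way to avoid that.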
FUNCTIONS AVAILABLE:

RESEQUENCE, DELETE, EXTRACT, MERGE, WEAVE, PAGE, MOVE,
LIFT, DUPLICATE, CATALOG, AND RUNOFF.

STRING FUNCTIONS:

$BEGIN, $END, $SUBSTITUTE, $NEGATE, $MULTIPLE, $BREAK,
$IGNORE, $ABORT, $FIND, $STRING, $TRANSLATION, $TIME,
$LIST, $LOCATE, $RUNOFF, $REPLACE, $INSERT, $MOVE, $DUPLICATE,
$TEXT AND $PROGRAM

FOR AN EXPLANATION OF A FUNCTION, PRECEDE THE FUNCTION
NAME WITH A '?'

? ? DELETE

DELETE/ N1, N2-N3, N4,

SINGLE LINES (N1 AND N4) AND BLOCKS OF LINES (N2-N3) ARE DELETED.
ALL ELSE IS SAVED.

? ? EXTRACT

EXTRACT/ N1, N2-N3,

SINGLE LINES (N1 AND N4) AND BLOCKS OF LINES (N2-N3) ARE SAVED.
ALL ELSE IS DELETED.

? ? RESEQUENCE

RESEQUENCE/ N1, N2, N3

RENUMBERS THE LINES OF A PROGRAM, ASSIGNING N1 TO LINE N2 AND
INCREMENTING BY N3. IF NO PARAMETERS ARE GIVEN 100, 0, 10 ARE ASSUMED.

MERGE/ M, A1, N1, A2, N2, ... AJ, NJ, ...

COLLECTS FROM 2 TO 9 PROGRAMS UNDER CURRENT PROBLEM NAME. PROGRAM
AJ IS INSERTED INTO PROGRAM M AFTER LINE NJ. IF NJ IS NOT GIVEN, AJ
IS PLACED AFTER THE LAST LINE IN M.
THE NEW PROGRAM IS THEN RESEQUENCED.

? ? WEAVE

WEAVE/ A, B, C,

SIMILAR TO MERGE BUT FINAL PROGRAM IS NOT RESEQUENCED. SEE MERGE.

Figure Va. Following an error in input, the Dartmouth Editor system
prints out the information in the top part of the figure. It gives
the entire list of editing commands available. The rest is a
description of certain editing operations on line-numbered files.





Figure Vb. Description of the commands $FIND, $LOCATE, and $REPLACE, among
others. These commands are useful in editing and in fact retrieval.

Figure Vc. Other editing operations provided in the Dartmouth College
Editor.


FOR $FI /DENTWAS/ $RE /DENT WAS/ $END
WAIT.

TIME: 0:02

READY.

FOR $FI /HADNO/ $RE /HAD NO/ $END
WAIT.

TIME: 0:02

READY.

FOR $FI /TOSHMAR/ $RE /TOSH MAR/ $END
WAIT.

TIME: 0:02

READY.

FOR $FI /BURIEDHERM/ $RE /BURIED AT THE HERM/ $END
TIME: 0:02

READY.

FOR $FI /MON, VIR/ $RE /MOND, VIR/ $END
WAIT.

TIME: 0:01

READY.

FOR $FI / i / $RE /J THEY DA--HAD / 10 $END
TIME: 0:02

READY.

FOR $FI /ANDMARTIN/ $RE /AND MARTIN/ $END
WAIT.

TIME: 0:01

Figure Vd. $END is required to make the changes permanent in the working
file. A separate save is required to make the corrections in the main
permanent file.
$LOC / CA/ $LOC /CA/

LOCATING:

648 653399 C8H4CL4F20 CA62
652 653457 C8H4CL4F20 CA62
656 19011 PHENOL, O-(2-AMINO-4-ETHYL-5-PYRIMIDYL)- CA63

LOCATING:

90 THIS IS A PORTION OF THE CHEMICAL REGISTRY FILE.
301 BUTYL CHLORIDE, 1-CHLOROBUTANE, N-PROPYLCARBINYL CHLORIDE
363 ***** EXCERPTS FROM THE CHEMICAL REGISTRY.
401 'E M-CHLOROCARBONILIC ACID C10H12CLNO2
500 'A BB-CAROTENE C40H56 29-38 1 16325
626 DECANE C10H22 ,95, WITH CH4,95,AND C5H12,95
648 653399 C8H4CL4F20 CA62
652 653457 C8H4CL4F20 CA62
656 19011 PHENOL, O-(2-AMINO-4-ETHYL-5-PYRIMIDYL)- CA63

Figure Ve. The numbered lines were selected from a larger file because
they contained the desired strings - in this case the letters CA or these
letters preceded by a blank. Note how many more lines are retrieved
when the blank is omitted from the string.
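
The effect of the leading blank can be reproduced with a plain substring test. In the sketch below, the function locate and the dictionary registry are hypothetical, and the file contents are abbreviated from the listing in the figure.

```python
def locate(lines, target):
    """Return the numbers of the lines whose text contains target."""
    return [number for number in sorted(lines) if target in lines[number]]


# A few lines abbreviated from the registry listing (illustrative only).
registry = {
    90: "THIS IS A PORTION OF THE CHEMICAL REGISTRY FILE.",
    301: "BUTYL CHLORIDE, 1-CHLOROBUTANE, N-PROPYLCARBINYL CHLORIDE",
    626: "DECANE C10H22",
    648: "653399 C8H4CL4F20 CA62",
    652: "653457 C8H4CL4F20 CA62",
}

with_blank = locate(registry, " CA")    # only the CA62 entries
without_blank = locate(registry, "CA")  # also CHEMICAL, CARBINYL, DECANE
```

The blank acts as a crude word boundary: without it, any word that happens to contain the two letters is retrieved as well.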
Applications of the Editor are not limited to the selection of single
lines from a file. It can be applied to the selection of blocks of text or
data by utilizing its facility to identify an "open string."

An open string is one that starts with a particular string and ends
with a particular string without regard to what, if anything, is between
them. The last example in Figure Vf contains the instruction: $BE $LO
/ 3/ /4Z/.

$BE resets the pointers to the beginning of the file. Next it locates
a line in the file having the digit 3 preceded by a blank (beginning in
this case with a 3) and terminating in a 4 (having a 4 followed by a carriage
return, designated here by the letter Z).

The introduction of a comma between the strings in a $LOCATE instruction
locates an open string regardless of the number of lines over which it extends.
The figures which follow show a number of interesting applications of this
feature of the Editor. Figure Vi shows the application of the open string
search to locating the "do loops" in a program. A do loop in BASIC starts
with a FOR statement and ends with a NEXT statement. The command: $LOC
/FOR I/, /A(I,J)/, /NEXT I/ would locate all do loops in the variable I which
contained some reference to the array A(I,J).
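
A multi-line open string search of this kind can be approximated with a regular expression. The sketch below is an assumption-laden illustration: the function locate_open is hypothetical, and the use of non-greedy matching is a choice made for the example rather than a statement of how $LOCATE resolves competing matches.

```python
import re


def locate_open(text, *parts):
    """Return every non-overlapping span that contains the given parts
    in order, with anything (including line breaks) between them."""
    pattern = ".*?".join(re.escape(part) for part in parts)
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.DOTALL)]


# A small hypothetical BASIC program held as one text with line breaks.
program = "\n".join([
    "10 FOR I = 1 TO 3",
    "20 LET S = S + A(I,J)",
    "30 NEXT I",
    "40 FOR K = 1 TO 3",
    "50 PRINT K",
    "60 NEXT K",
])

loops = locate_open(program, "FOR I", "A(I,J)", "NEXT I")
```

With this sample, the single span found runs from FOR I on line 10 through NEXT I on line 30; the loop on K is not reported because it contains none of the three strings.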

The examples used thus far were manufactured to illustrate the operation 
of the string search capability. While it is possible to infer from these 
examples how well the present system would work in general, it would be more 
interesting to see how useful these search tools are in retrieving facts from 
existing files which were not formatted especially for this system. 

The first of these is a file containing information on hydrogen sulfide
as a poison. Figure Vj shows the arrangement of the file as stored on a
line-numbered system, with little more structure than the fact that each main
segment has been made a paragraph and that the paragraph ends with the phrase
END OF SYMPTOMS or END OF TREATMENT as appropriate. The next figure shows
how the information on the treatment is extracted simply by asking for an
open string starting with the word TREATMENT and ending with the phrase
END OF TREATMENT.

Another example of an existing file from which it would be possible to
extract information is that containing a digest of Bills and Resolutions of
a session of Congress. Figure Vl shows a portion of such a published digest.

The next figure shows the same information as stored on a computer. Note
how similar (except for the lack of lower case letters) the computerized
file is to the published one. The figures which follow give examples
of the utility for data retrieval of the editing commands presently provided
in the Dartmouth Editor.

It is important to recognize that the Dartmouth College Editor was not
intended for the kind of information or data retrieval to which it has been
applied here. It should not be unreasonable, therefore, to find some deficiencies.
They are surprisingly few and quite easy to fix. The changes or, more properly,
the additions that need to be made to the Editor to remove these deficiencies
are really quite simple and straightforward. A few of these will be mentioned
briefly at the end of this section.
FOR $SUB /Z/ $LOC /2Z/ $LOC / 6/ $LOC / 3/ /6Z/

LOCATING:

MORE? $BE $LO / 3/ /4Z/

LOCATING:

Figure Vf. Here the $SUB command equates the character Z with a 
carriage return. This permits one to select lines having a particular 
ending digit or string. See the next figure for the file from which 
these lines are selected. 
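
The idea in the caption above, a visible character standing in for the carriage return, can be sketched in Python. All names here (substitute, lines_containing, locate_in_line, the stand_in parameter) are hypothetical, and the newline character is assumed to play the role of the carriage return.

```python
def substitute(pattern, stand_in="Z"):
    """Replace the stand-in character by a newline, in the manner
    the caption describes for $SUB /Z/."""
    return pattern.replace(stand_in, "\n")


def lines_containing(text, pattern, stand_in="Z"):
    """Lines containing pattern; a trailing stand-in anchors the
    pattern to the end of the line."""
    needle = substitute(pattern, stand_in)
    return [line.rstrip("\n")
            for line in text.splitlines(keepends=True) if needle in line]


def locate_in_line(text, first, last, stand_in="Z"):
    """Lines holding an open string: first, then anything, then last."""
    first = substitute(first, stand_in)
    last = substitute(last, stand_in)
    hits = []
    for line in text.splitlines(keepends=True):
        start = line.find(first)
        if start != -1 and line.find(last, start + len(first)) != -1:
            hits.append(line.rstrip("\n"))
    return hits


# A tiny made-up picture file: line number, blank, then blackness digits.
picture = "14 1122\n15 6577\n16 3554\n17 3551\n"

ending_in_2 = lines_containing(picture, "2Z")
starting_with_6 = lines_containing(picture, " 6")
from_3_to_4 = locate_in_line(picture, " 3", "4Z")
```

The leading blank in " 6" and " 3" anchors the digit to the start of the text, since it follows the blank after the line number.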
CONSIDER THE FOLLOWING DIGITIZED PICTURE
WHERE THE NUMBERS INDICATE DEGREES OF BLACKNESS
LET US EXTRACT ALL LINES BEGINNING WITH 6, OR
ENDING IN A 2, OR STARTING WITH 3 AND ENDING IN 4.

SYSTEM--BAS
NEW OR OLD--NEW: PICTURE
READY.

TAPE
READY.



1 32334432 1 1 

1112233444421 
1 1 1 1233444321 
1 1 122223334332 
11122223333221 
1 1122223222221 
11 1 1 1 1222221 1 1 1 
123211 112212222222111 
135654211111223322222221111 

65777516222261555162222444 

6-777751666661 5333562224444 

157777351666613777351224444 

337777351 16661377735664444 
156666765543335677764321 1 
1356666666655444566654321 1 
1 346777766665544445554322 1 

25677777666665544444432111 
15777777777776665554433321 
25777777777777766555544321 1 
36777777777777776665555432 1 
36777777777777777766666542 

i 5777777777777777777766432 
2567777777777777777766431 
135777777777777777776641 
1356777777777777777641 
1246777777777777531 
123556677776542 
1 1 1223332! 


Figure Vg. The file from which the lines on the previous figure were
extracted. It is a digitized picture from a file used for character
recognition.
COMBO     22:14

101 THIS LINE BY CHEMISTRY
102 THIS LINE IS FROM THE PHYSICS FILE
103 THIS LINE IS FROM THE LITERATURE FILE.
111 THIS LINE IS FROM THE CHEM FILE.

STOP.
READY.

FOR $SUB /X/ $FI /X1/ $RE /X191/ 50 $END
WAIT.

TIME: 0:02

READY.

LIST

COMBO     22:16

101 THIS LINE BY CHEMISTRY
19102 THIS LINE IS FROM THE PHYSICS FILE
19103 THIS LINE IS FROM THE LITERATURE FILE.
19111 THIS LINE IS FROM THE CHEM FILE.

STOP.
READY.

FOR $FI /101 THIS/ $RE /19101 THIS/ $END
WAIT.

READY.

LIST
WAIT.

COMBO     22:17

19101 THIS LINE BY CHEMISTRY
19102 THIS LINE IS FROM THE PHYSICS FILE
19103 THIS LINE IS FROM THE LITERATURE FILE.
19111 THIS LINE IS FROM THE CHEM FILE.
19112 THIS LINE IS FROM THE PHYSICS FILE
19113 THIS LINE IS FROM TE


Figure Vh. Another example of the use of the $SUB command. Note that
here the X is defined as the carriage return and the replacement is
made nominally 50 times to insure that it makes changes in all
the lines (except the first).
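
The renumbering trick in this figure amounts to replacing "carriage return followed by 1" with "carriage return followed by 191". A minimal sketch, assuming the newline stands for the carriage return; the function name prefix_lines and its parameters are hypothetical:

```python
def prefix_lines(text, find="X1", replace="X191", stand_in="X", limit=50):
    """Apply the replacement with the stand-in read as a newline,
    changing at most `limit` occurrences."""
    find = find.replace(stand_in, "\n")
    replace = replace.replace(stand_in, "\n")
    return text.replace(find, replace, limit)


combo = "\n".join([
    "101 THIS LINE BY CHEMISTRY",
    "102 THIS LINE IS FROM THE PHYSICS FILE",
    "103 THIS LINE IS FROM THE LITERATURE FILE.",
    "111 THIS LINE IS FROM THE CHEM FILE.",
])

renumbered = prefix_lines(combo).splitlines()
```

The first line is untouched because nothing precedes it, which is why the figure needs a second, explicit replacement of 101 by 19101.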
We have seen earlier how the instruction DELETE 210, 214, 215, 300-400
erases lines 210, 214, 215 and the block of lines 300 to 400. On the other
hand, the instruction EXTRACT 200-300, 400-500 is the complement of the
DELETE instruction. Here everything in the file is wiped out except the
two blocks from 200 to 300 and from 400 to 500.
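
The complementary behavior of DELETE and EXTRACT can be sketched as two filters over the same line specification. The names in_spec, delete, and extract, and the list-of-numbers-and-tuples form of the specification, are hypothetical conveniences for the illustration.

```python
def in_spec(number, spec):
    """True if number is named by spec, a list whose items are either
    single line numbers or (low, high) blocks, both ends included."""
    for item in spec:
        if isinstance(item, tuple):
            low, high = item
            if low <= number <= high:
                return True
        elif item == number:
            return True
    return False


def delete(lines, spec):
    """Keep everything except the named lines and blocks."""
    return {n: t for n, t in lines.items() if not in_spec(n, spec)}


def extract(lines, spec):
    """Keep only the named lines and blocks."""
    return {n: t for n, t in lines.items() if in_spec(n, spec)}


# A made-up file with lines 200, 210, ..., 510.
source = {n: "LINE %d" % n for n in range(200, 520, 10)}

after_delete = delete(source, [210, 214, 215, (300, 400)])
after_extract = extract(source, [(200, 300), (400, 500)])
```

For any one specification, the lines kept by delete and the lines kept by extract together make up the whole file, with no line in both.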

In the string mode it is possible to extract from the file all lines
containing a particular word or phrase or any sequence of characters. This
is done via the instruction $LOCATE. Thus, $LOC /JONES/ will locate all
the lines in the program or file containing the word JONES and print them on
the Teletype. This operation is not really equivalent to the EXTRACT
instruction because the working file is not altered by the $LOCATE instruction.

A modification of the Editor to provide a string function parallel to the
EXTRACT function, thereby making it possible to abridge a file from context,
would go a long way to make the Editor into a very powerful information
retrieval system. The command might well be called ABRIDGE. Such a command
can be used together with the LENGTH command to ascertain the size of the
file which is residual after each use of the ABRIDGE command.

There are two additional changes to the Editor which would extend its
use to information retrieval even further. The changes are to the LOCATE
command involving elliptical (open) strings. The command: $LOCATE /BOY/ /DOG/
now would print the line that says, "The boy found his dog," but would not
find the line that contained, "The dog bit the boy." A command in which the
order is not crucial would be useful. Perhaps $PLOCATE would be a suitable
command.
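
The difference between the present ordered search and the proposed order-free one can be sketched as follows. Both function names are hypothetical, and the three-line file is invented for the example.

```python
def locate_in_order(lines, first, second):
    """Lines in which first occurs and second occurs somewhere after it."""
    hits = []
    for number in sorted(lines):
        start = lines[number].find(first)
        if start != -1 and lines[number].find(second, start + len(first)) != -1:
            hits.append(number)
    return hits


def locate_any_order(lines, *words):
    """Lines containing every word, whatever the order."""
    return [n for n in sorted(lines) if all(w in lines[n] for w in words)]


story = {
    10: "THE BOY FOUND HIS DOG",
    20: "THE DOG BIT THE BOY",
    30: "THE CAT SLEPT",
}

ordered = locate_in_order(story, "BOY", "DOG")
unordered = locate_any_order(story, "BOY", "DOG")
```

The ordered search misses line 20, which is exactly the deficiency the proposed command is meant to remove.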
Perhaps the most important and far reaching change to the Editor would
be to have a command called SELECT as a variant of the open string command
$LOCATE /A/, /B/, /C/ which would yield not all of the strings that satisfy
the search, but only the shortest. In terms of the strings A, B, C the
suggested command SELECT should set a pointer at the first occurrence of A, but
should move it to the second A if that occurs before string B, etc. In the
case of string B, however, duplicate occurrences may be allowed. The string
would continue until the first occurrence of the string C. Thus, in the
sequence AAABBCCCC, only the string sequence ABBC would be selected.
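
One possible reading of the proposed SELECT rule is sketched below. SELECT is only a suggestion in the text, so the function select_shortest is entirely hypothetical: it finds the first B that follows an A, backs the starting pointer up to the last A before that B, and stops at the first C after it.

```python
def select_shortest(text, a, b, c):
    """Return the shortest open string a ... b ... c built around the
    first b that follows an a, or None if no such string exists."""
    first_a = text.find(a)
    if first_a == -1:
        return None
    b_pos = text.find(b, first_a + len(a))
    if b_pos == -1:
        return None
    # Move the pointer forward to the last a that lies wholly before b.
    start = text.rfind(a, first_a, b_pos)
    c_pos = text.find(c, b_pos + len(b))
    if c_pos == -1:
        return None
    return text[start:c_pos + len(c)]


picked = select_shortest("AAABBCCCC", "A", "B", "C")

# Paragraphs marked by a three-blank indent (a made-up file).
digest = ("   FIRST BILL ON TAXES.\n"
          "   SECOND BILL ON COMMODITY PRICES.\n"
          "   THIRD BILL.\n")
paragraph = select_shortest(digest, "   ", "COMMODITY PRICES", "   ")
```

On the sequence from the text this yields ABBC, and on the indented sample it yields only the paragraph that mentions commodity prices, closed off by the indent of the paragraph that follows.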

The utility of such a variant can be appreciated if we consider the
problem of finding that paragraph in a file containing the words "commodity
prices." We would like to use the command: $SELECT /   /, /COMMODITY PRICES/,
/   /. As $LOCATE is presently operative, we get as many strings as there
are paragraphs in the file ahead of the one we want. Figure Vo shows the
results of such a search on a file of Bills before the Congress, as well as
some lines from Shakespeare. In this example we wish to use the indented
spaces in each paragraph as the outside strings. It should be quite
straightforward, after setting a pointer at the first occurrence of the paragraph
indent, to move the pointer down one paragraph each time a new paragraph (a
string of 3 or 4 blanks) is encountered before the words "commodity prices" occur.